Skip to content

Global-knot spline LSQ with matrix-free CGLS#150

Open
krystophny wants to merge 2 commits intomainfrom
feature/lsq_scattered_interpolate
Open

Global-knot spline LSQ with matrix-free CGLS#150
krystophny wants to merge 2 commits intomainfrom
feature/lsq_scattered_interpolate

Conversation

@krystophny
Copy link
Copy Markdown
Member

@krystophny krystophny commented Dec 2, 2025

User description

This PR implements a global-knot, matrix-free CGLS-based least-squares spline fitting pipeline for 1D/2D/3D and batch 1D splines.

Highlights:

100% tests passed, 0 tests failed out of 44

Label Time Summary:
biot_savart = 9.04 secproc (1 test)
build-helper = 0.02 sec
proc (1 test)
field = 0.70 secproc (10 tests)
magfie = 53.77 sec
proc (4 tests)
poincare = 0.02 secproc (1 test)
polylag = 0.01 sec
proc (1 test)
tilted_coil = 44.72 sec*proc (3 tests)

Total Test time (real) = 76.80 sec (ctest), including new LSQ and CGLS suites.


PR Type

Enhancement


Description

  • Implement matrix-free CGLS solvers for small and dense systems

    • cgls_small.f90: Fixed-size solver for normal equations (ISO Fortran 2018)
    • cgls_dense.f90: General dense solver with OpenMP/SIMD optimization and optional weights
  • Add global-knot LSQ spline constructors for 1D/2D/3D and batch 1D

    • Design matrix builders for each dimension using basis function evaluation
    • LSQ constructors that solve for knot values via CGLS, preserving spline continuity
  • Extend test coverage with new CGLS solver tests and LSQ spline validation

    • Tests for scattered data interpolation in 1D/2D/3D
    • Tests for masked domains (circular, spherical, toroidal)
    • Global-knot design matrix exactness verification

Diagram Walkthrough

flowchart LR
  A["Scattered Data<br/>x_data, f_data"] --> B["Design Matrix<br/>build_design_matrix_*d"]
  B --> C["CGLS Solver<br/>cgls_dense_solve"]
  C --> D["Global Knots<br/>y_knots"]
  D --> E["Spline Coefficients<br/>construct_splines_*d"]
  E --> F["Fitted Spline<br/>SplineData*D"]
Loading

File Walkthrough

Relevant files
Enhancement
4 files
cgls_small.f90
New small fixed-size CGLS solver module                                   
+99/-0   
cgls_dense.f90
New dense matrix CGLS solver with OpenMP/SIMD                       
+212/-0 
interpolate.f90
Add LSQ constructors and design matrix builders for 1D/2D/3D
+405/-2 
batch_interpolate_1d.f90
Add batch LSQ constructor and basis unit allocator             
+117/-1 
Tests
5 files
test_cgls_small.f90
New tests for small CGLS solver on SPD matrices                   
+96/-0   
test_cgls_dense.f90
New tests for dense CGLS solver weighted/unweighted           
+97/-0   
test_interpolate.f90
Add LSQ spline tests for 1D/2D/3D scattered and masked domains
+456/-2 
test_batch_interpolate.f90
Add batch 1D LSQ scattered data interpolation test             
+91/-1   
test_spline_cgls_global.f90
New test for global-knot design matrix CGLS exactness       
+109/-0 
Configuration changes
2 files
CMakeLists.txt
Register new CGLS solver modules in build                               
+2/-0     
CMakeLists.txt
Register new CGLS and global-knot test executables             
+12/-0   

@qodo-code-review
Copy link
Copy Markdown
Contributor

qodo-code-review bot commented Dec 2, 2025

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
🟢
No security concerns identified No security vulnerabilities detected by AI analysis. Human verification advised for critical code.
Ticket Compliance
🟡
🎫 #1
🟢 Minimize reliance on Fortran features beyond F95 where possible.
🔴 Make MPI entirely optional in low-level routines like arnoldi via preprocessor or remove
MPI usage from such routines.
Reduce dependencies on external libraries where feasible.
🟡
🎫 #4
🔴 Implement a scanner to handle Fortran module dependencies for Ninja dyndep integration.
Ensure Ninja build uses dyndep for accurate Fortran module build ordering.
🟡
🎫 #3
🔴 Provide latest versions of magnetic field routine variants across multiple coordinate
systems and sources.
🟡
🎫 #2
🔴 Integrate, document, and test VMEC interfaces, potentially requiring NetCDF/HDF5.
Unify VMEC field routines across codes, incorporating specific fixes and features.
Add automated unit tests for VMEC-related routines and derivatives comparisons.
Add documentation (API/user docs) for VMEC routines.
Various TODOs including stopping empty NetCDF creation, tests for derivatives and fields
vs libstell, pfUnit automation, coverage, GitHub actions, adding INTENTs, and Doxygen
docs.
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
No audit logs: The new numerical routines perform computations and error stops without any audit logging,
but given this is a numerical library layer with no sensitive actions, it likely does not
require audit trails; confirm at a higher layer if critical actions need logging.

Referred Code
subroutine cgls_dense_core(phi, rhs, x, max_iter, tol)
    !! Conjugate Gradient for normal equations using
    !! explicit matrix-vector products (unweighted case).
    real(dp), intent(in) :: phi(:,:)
    real(dp), intent(in) :: rhs(:)
    real(dp), intent(inout) :: x(:)
    integer, intent(in) :: max_iter
    real(dp), intent(in) :: tol

    integer :: iter, n_rows, n_cols
    real(dp) :: gamma, gamma_new, alpha, beta, denom
    real(dp) :: rhs_norm

    real(dp), allocatable :: r(:), s(:), p(:), q(:)

    n_rows = size(phi, 1)
    n_cols = size(phi, 2)

    if (size(rhs) /= n_rows) then
        error stop "cgls_dense_core: rhs size mismatch"
    end if


 ... (clipped 109 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Error stops only: New constructors use error stop on input mismatches and empty datasets without contextual
recovery or status returns, which may be acceptable for a library/test context but should
be reviewed if used in production flows expecting graceful handling.

Referred Code
subroutine construct_splines_1d_lsq(x_min, x_max, order, periodic, &
    num_points, x_data, f_data, spl, weights)
    real(dp), intent(in) :: x_min, x_max
    integer, intent(in) :: order
    logical, intent(in) :: periodic
    integer, intent(in) :: num_points
    real(dp), intent(in) :: x_data(:)
    real(dp), intent(in) :: f_data(:)
    real(dp), intent(in), optional :: weights(:)
    type(SplineData1D), intent(out) :: spl

    integer :: n_data, n_keep, i
    real(dp), allocatable :: x_used(:), f_used(:), w_used(:)
    real(dp), allocatable :: phi(:,:), y(:)
    real(dp) :: h_ref, tol_x
    logical :: use_direct

    n_data = size(x_data)
    if (n_data /= size(f_data)) then
        error stop "construct_splines_1d_lsq: x_data and f_data size mismatch"
    end if


 ... (clipped 74 lines)

Learn more about managing compliance generic rules or creating your own custom rules

  • Update
Compliance status legend 🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@qodo-code-review
Copy link
Copy Markdown
Contributor

qodo-code-review bot commented Dec 2, 2025

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
High-level
Avoid large temporary memory allocations

To reduce memory usage in the weighted least-squares solver, modify the
matrix-vector product routines in cgls_dense.f90 to apply weights dynamically
instead of creating large temporary copies of the design matrix and right-hand
side.

Examples:

src/interpolate/cgls_dense.f90 [190-209]
        if (present(w)) then
            allocate(phi_eff(n_rows, n_cols))
            allocate(rhs_eff(n_rows))

            !$omp parallel do default(shared) private(i, j)
            do i = 1, n_rows
                rhs_eff(i) = w(i)*rhs(i)
                !$omp simd
                do j = 1, n_cols
                    phi_eff(i, j) = w(i)*phi(i, j)

 ... (clipped 10 lines)

Solution Walkthrough:

Before:

subroutine cgls_dense_solve(phi, rhs, x, w, ...)
    ...
    if (present(w)) then
        ! Allocate full copies of phi and rhs
        allocate(phi_eff(n_rows, n_cols))
        allocate(rhs_eff(n_rows))

        ! Populate weighted copies
        do i = 1, n_rows
            rhs_eff(i) = w(i)*rhs(i)
            do j = 1, n_cols
                phi_eff(i, j) = w(i)*phi(i, j)
            end do
        end do

        call cgls_dense_core(phi_eff, rhs_eff, x, ...)
        deallocate(phi_eff, rhs_eff)
    else
        call cgls_dense_core(phi, rhs, x, ...)
    end if
end subroutine

After:

subroutine cgls_dense_solve(phi, rhs, x, w, ...)
    ...
    ! No large temporary allocations for weights
    call cgls_dense_core(phi, rhs, x, w, ...)
end subroutine

subroutine cgls_dense_core(phi, rhs, x, w, ...)
    ...
    ! Initial residual calculation
    if (present(w)) then
        r = w * rhs
    else
        r = rhs
    end if

    ! Matrix-vector products apply weights on-the-fly
    call matvec_phi_t_core(phi, r, s, w) ! Computes s = phi^T * (w*r)
    ...
    loop
        call matvec_phi_core(phi, p, q, w) ! Computes q = (w*phi) * p
        ...
        call matvec_phi_t_core(phi, r, s, w) ! Computes s = phi^T * (w*r)
        ...
    end loop
end subroutine
Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies a significant memory allocation issue in cgls_dense_solve that could impact performance with large datasets, and the proposed on-the-fly weighting is a valid and impactful optimization.

Medium
Possible issue
Correct the solver's stopping condition

Change the relative stopping condition in cgls_solve_small to an absolute
tolerance check to prevent premature convergence.

src/interpolate/cgls_small.f90 [85]

-if (r_norm2_new <= tol*r_norm2) exit
+if (r_norm2_new <= tol) exit
  • Apply / Chat
Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies a potentially problematic relative stopping criterion in an iterative solver and proposes a more robust absolute tolerance check, which prevents premature termination and improves correctness.

Medium
General
Refactor duplicated data filtering logic

Refactor the two-loop data filtering logic in the new ..._lsq subroutines into a
more efficient single loop by pre-allocating arrays and resizing with
move_alloc.

src/interpolate/interpolate.f90 [242-272]

 subroutine construct_splines_1d_lsq(x_min, x_max, order, periodic, &
     num_points, x_data, f_data, spl, weights)
     ...
-    n_keep = 0
-    do i = 1, n_data
-        if (.not. periodic) then
-            if (x_data(i) < x_min .or. x_data(i) > x_max) cycle
-        end if
-        if (present(weights)) then
-            if (weights(i) == 0.0_dp) cycle
-        end if
-        n_keep = n_keep + 1
-    end do
-
-    if (n_keep == 0) then
-        error stop "construct_splines_1d_lsq: no usable data"
-    end if
-
-    allocate(x_used(n_keep), f_used(n_keep))
-    if (present(weights)) allocate(w_used(n_keep))
+    allocate(x_used(n_data), f_used(n_data))
+    if (present(weights)) allocate(w_used(n_data))
 
     n_keep = 0
     do i = 1, n_data
         if (.not. periodic) then
             if (x_data(i) < x_min .or. x_data(i) > x_max) cycle
         end if
         if (present(weights)) then
             if (weights(i) == 0.0_dp) cycle
             w_used(n_keep + 1) = weights(i)
         end if
         n_keep = n_keep + 1
         x_used(n_keep) = x_data(i)
         f_used(n_keep) = f_data(i)
     end do
+
+    if (n_keep == 0) then
+        error stop "construct_splines_1d_lsq: no usable data"
+    end if
+
+    if (n_keep < n_data) then
+        call move_alloc(from=x_used(1:n_keep), to=x_used)
+        call move_alloc(from=f_used(1:n_keep), to=f_used)
+        if (present(weights)) then
+            call move_alloc(from=w_used(1:n_keep), to=w_used)
+        end if
+    end if
     ...
 end subroutine construct_splines_1d_lsq

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 6

__

Why: The suggestion correctly identifies a repeated and inefficient two-pass filtering pattern across multiple new subroutines and proposes a valid, more performant single-pass approach.

Low
  • Update

@krystophny krystophny force-pushed the feature/lsq_scattered_interpolate branch from 570f11a to 29c0fc9 Compare January 11, 2026 01:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant